- Introduced support for 2Fetch features in the branch predictor and fetch stage.
- Enhanced the FetchTargetQueue to manage next-FTQ entries for 2Fetch functionality.
- If two fetch blocks fall in the same 64-byte fetch buffer, they can now both be fetched in the same cycle.

Change-Id: I3b112cc844c485d81cea1f4ed0ff221cb37d2782
📝 Walkthrough

Introduces a "2Fetch" feature for the decoupled branch predictor (lookahead into the next FTQ entry) and centralizes the per-cycle fetch-stopping decision into `shouldStopFetchThisCycle()`.
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/cpu/pred/btb/fetch_target_queue.cc (1)
287-293: Minor: Redundant map lookup in `peekNext()`.

The method performs the same `ftq.find(fetchDemandTargetId + 1)` lookup twice: once inside the `assert(hasNext())` call and once explicitly. Consider caching the result or using a single lookup.

♻️ Suggested optimization

```diff
 const FtqEntry& FetchTargetQueue::peekNext() const {
-    assert(hasNext());
-    auto next_it = ftq.find(fetchDemandTargetId + 1);
+    auto next_it = ftq.find(fetchDemandTargetId + 1);
+    assert(next_it != ftq.end() && "peekNext() requires hasNext()");
     return next_it->second;
 }
```

src/cpu/pred/btb/decoupled_bpred.cc (1)
167-167: Formatting issue: Misaligned closing brace.

The closing brace appears to have lost its indentation relative to the surrounding code.

♻️ Suggested fix

```diff
     for (int i = 0; i < numStages; i++) {
         predsOfEachStage[i].btbEntries.clear();
-}
+    }
```

src/cpu/pred/btb/decoupled_bpred.hh (1)
1013-1027: Consider encapsulation: `has1Fetched` is internal state that should be private.

The `has1Fetched` flag tracks whether the first FTQ entry has been processed in a 2Fetch cycle. It currently sits in the public section alongside `enable2Fetch` and `maxFetchBytesPerCycle`, but unlike those configuration parameters, `has1Fetched` is runtime state that should not be accessed externally.

♻️ Suggested reorganization

Move `has1Fetched` to the private section near other internal state variables, keeping `enable2Fetch` and `maxFetchBytesPerCycle` as public configuration parameters:

```diff
+  private:
+    // 2Fetch internal state
+    bool has1Fetched{false}; ///< Whether first FTQ has been fetched in current 2Fetch cycle
+
+  public:
     // NEW: 2Fetch support variables
     /**
      * @brief Enable 2fetch capability
      */
     bool enable2Fetch{true};

-    /**
-     * @brief Whether fetched first FTQ
-     */
-    bool has1Fetched{false};
-
     /**
      * @brief Maximum fetch bytes per cycle for 2fetch
      */
     unsigned maxFetchBytesPerCycle{64};
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- src/cpu/o3/fetch.cc
- src/cpu/o3/fetch.hh
- src/cpu/o3/fetch.md
- src/cpu/pred/BranchPredictor.py
- src/cpu/pred/btb/decoupled_bpred.cc
- src/cpu/pred/btb/decoupled_bpred.hh
- src/cpu/pred/btb/decoupled_bpred_stats.cc
- src/cpu/pred/btb/fetch_target_queue.cc
- src/cpu/pred/btb/fetch_target_queue.hh
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-12T02:49:08.514Z
Learnt from: jensen-yan
Repo: OpenXiangShan/GEM5 PR: 686
File: src/cpu/pred/btb/btb_mgsc.cc:226-230
Timestamp: 2026-01-12T02:49:08.514Z
Learning: In src/cpu/pred/btb/btb_mgsc.cc, the calculateScaledPercsum function returns the unscaled percsum (weight scaling disabled) to align with RTL implementation. The RTL currently does not have a weight table, so setting weight to 1 (equivalent to disabling the weight table) maintains hardware/software consistency with minimal performance impact.
Applied to files:
src/cpu/pred/btb/decoupled_bpred.cc
🧬 Code graph analysis (2)
src/cpu/o3/fetch.cc (1)
- src/cpu/o3/fetch.hh (2)
  - `predictedBranch` (522-531)
  - `predictedBranch` (522-522)

src/cpu/pred/btb/decoupled_bpred.cc (2)
- src/cpu/pred/btb/decoupled_bpred.hh (7)
  - `pc` (1041-1041)
  - `target_to_fetch` (854-854)
  - `current_pc` (1037-1037)
  - `fetchTargetQueue` (510-513)
  - `fetchTargetQueue` (515-518)
  - `fetchTargetQueue` (520-523)
  - `fetchTargetQueue` (524-527)
- src/cpu/pred/btb/test/decoupled_bpred.cc (2)
  - `processFetchTargetCompletion` (343-361)
  - `processFetchTargetCompletion` (344-344)
🪛 Ruff (0.14.10)
src/cpu/pred/BranchPredictor.py
1185-1185: Param may be undefined, or defined from star imports
(F405)
1186-1186: Param may be undefined, or defined from star imports
(F405)
1187-1187: Param may be undefined, or defined from star imports
(F405)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Quick Build, Unit Tests & Smoke Test
- GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (12)
src/cpu/o3/fetch.cc (1)
2071-2091: LGTM - Clean refactoring of the fetch loop termination condition.

The replacement of the inline condition with `shouldStopFetchThisCycle(predictedBranch)` properly centralizes the fetch stopping logic. The method in `fetch.hh` correctly handles:
- `waitForVsetvl` gating
- Decoupled frontend: `usedUpFetchTargets` check
- `predictedBranch` for traditional frontends

This sets a good foundation for future 2Fetch extensions.
src/cpu/pred/BranchPredictor.py (1)
1184-1187: LGTM - New 2Fetch configuration parameters are well-defined.

The new parameters follow gem5 conventions:
- Features are disabled by default (`False`), which ensures backward compatibility
- `maxFetchBytesPerCycle=64` is a sensible default matching typical fetch buffer sizes
- The static analysis warnings about `Param` are false positives, since it is imported via the star import from `m5.params` at line 29

src/cpu/o3/fetch.hh (1)
517-531: LGTM - Well-documented centralized fetch stop decision.

The `shouldStopFetchThisCycle()` method cleanly encapsulates the per-cycle fetch termination logic with:
- Clear documentation explaining the three cases
- Proper priority ordering (waitForVsetvl → decoupled frontend → predictedBranch)
- An inline implementation appropriate for this lightweight decision function

This consolidation improves maintainability and prepares the codebase for 2Fetch extensions.
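The three-way priority is simple enough to sketch standalone. The following is a minimal illustration under assumed names, not the gem5 implementation: `waitForVsetvl` and `usedUpFetchTargets` mirror the flags discussed in this review, while `isDecoupledFrontend` is a stand-in for however the frontend mode is actually queried.

```cpp
// Minimal sketch of a centralized per-cycle fetch-stop decision.
// Member names are illustrative stand-ins for the flags discussed above.
struct FetchStage {
    bool waitForVsetvl = false;        // gated on vsetvl resolution
    bool isDecoupledFrontend = false;  // FTQ-driven frontend mode
    bool usedUpFetchTargets = false;   // no more FTQ entries this cycle

    bool shouldStopFetchThisCycle(bool predictedBranch) const {
        if (waitForVsetvl)             // case 1: highest priority
            return true;
        if (isDecoupledFrontend)       // case 2: stop when FTQ targets run out
            return usedUpFetchTargets;
        return predictedBranch;        // case 3: traditional frontend stops at a taken branch
    }
};
```

The point of the single predicate is that the fetch loop's while-condition stays a one-liner while each frontend mode keeps its own termination rule in one place.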
src/cpu/o3/fetch.md (2)
344-353: Documentation accurately reflects the code changes.

The updated documentation correctly shows:
- The new `shouldStopFetchThisCycle(bool predictedBranch)` signature
- The three-way logic (waitForVsetvl → decoupled frontend → predictedBranch)
- Consistency with the implementation in `fetch.hh`

This keeps the documentation in sync with the refactored code.

584-586: Code example updated to match implementation.

The fetch loop condition example now correctly uses `!shouldStopFetchThisCycle(predictedBranch)`, maintaining consistency between the documentation and the actual code.

src/cpu/pred/btb/fetch_target_queue.hh (1)
160-181: Well-designed 2Fetch lookahead API.

The new methods provide a clean interface for 2Fetch traversal:
- `hasNext()` / `peekNext()` / `advance()` follow a standard lookahead pattern
- Returning `const FtqEntry&` from `peekNext()` prevents unintended modification
- `peekNext()` properly asserts its precondition via `assert(hasNext())`, ensuring safe use

src/cpu/pred/btb/decoupled_bpred_stats.cc (1)
508-514: LGTM! Statistics for the 2Fetch feature are well-structured.

The new statistics cover all decision paths in the 2Fetch logic: attempts, successes, and specific rejection reasons. This provides good observability for tuning and debugging the feature.
src/cpu/pred/btb/fetch_target_queue.cc (1)
301-319: LGTM! `advance()` correctly updates supply state for 2Fetch.

The method properly updates `supplyFetchTargetState` to point to the current demand target (which has already been incremented by `finishCurrentFetchTarget()`). The fallback of invalidating the supply state when the entry does not exist is appropriate.

src/cpu/pred/btb/decoupled_bpred.cc (2)
1401-1446: LGTM! `canExtendToNextFTQ` has well-structured validation.

The method correctly validates all preconditions for a 2Fetch extension:
- The first block must end with a taken branch (to jump within the fetch buffer)
- The next FTQ entry must be available
- The current PC must match the next FTQ entry's start (ensuring continuity)
- The total span must fit within `maxFetchBytesPerCycle`

The early-exit checks are ordered efficiently, and each rejection path records a specific statistic for observability.
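The four preconditions above can be restated as a self-contained sketch. The function and field names below are illustrative stand-ins, not the actual gem5 signatures:

```cpp
#include <cstdint>

using Addr = uint64_t;

// Stand-in for the FTQ entry fields the checks need.
struct FtqEntry { Addr startPC; Addr endPC; bool taken; };

// Mirrors the four gating conditions described in the review,
// in the same early-exit order.
bool canExtend(Addr currentPC, const FtqEntry &cur,
               const FtqEntry *next, unsigned maxFetchBytesPerCycle)
{
    if (!cur.taken)                    // first block must end in a taken branch
        return false;
    if (next == nullptr)               // next FTQ entry must be available
        return false;
    if (currentPC != next->startPC)    // continuity of the fetch stream
        return false;
    Addr span = next->endPC - cur.startPC;
    return span <= maxFetchBytesPerCycle;  // both blocks fit in one fetch buffer
}
```

In the real code each `return false` path would additionally bump its rejection statistic, as the review notes.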
1456-1477: LGTM! `extendToNextFTQ` correctly sets up processing of the second FTQ entry.

The method properly:
- Advances to the next FTQ entry via `fetchTargetQueue.advance()`
- Resets the instruction counter for the new FTQ entry
- Updates the PC to the start of the new FTQ entry
- Records the successful 2Fetch metric
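Those steps can be sketched standalone; all names below are illustrative stand-ins for the gem5 members:

```cpp
#include <cstdint>

// Minimal sketch of the extend-to-next-FTQ bookkeeping described above.
struct MiniBpu {
    uint64_t fetchDemandTargetId = 0;  // stand-in for the FTQ demand pointer
    unsigned fetchedInsts = 0;         // per-entry instruction counter

    // Returns the new PC; the caller would continue fetching from it
    // within the same cycle.
    uint64_t extendToNext(uint64_t nextStartPC) {
        ++fetchDemandTargetId;   // step 1: advance to the next FTQ entry
        fetchedInsts = 0;        // step 2: reset the instruction counter
        return nextStartPC;      // step 3: PC jumps to the new entry's start
    }
};
```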
src/cpu/pred/btb/decoupled_bpred.hh (2)
383-389: LGTM! Statistics declarations are well-documented.

The statistics cover all decision paths in the 2Fetch logic, with clear documentation comments explaining each counter's purpose.

1029-1041: LGTM! Method declarations are appropriate.

The method signatures correctly use const references where modification is not needed, and a non-const reference for the PC that gets updated.
```cpp
    // Check if 2fetch is enabled, not fetched first FTQ yet, and if we can extend to the next FTQ
    // NEW: 2Fetch extension check - before processing completion
    dbpBtbStats.fetch2Attempts++;
    if (enable2Fetch && !has1Fetched && canExtendToNextFTQ(pc, target_to_fetch)) {
        DPRINTF(DecoupleBP, "2Fetch: extending to next FTQ in same cycle\n");
        has1Fetched = true;
        processFetchTargetCompletion(target_to_fetch);
        extendToNextFTQ(pc);
        // first fetchBlock is always taken, do not run out of FTQ now
        return std::make_pair(true, false);
    }

    processFetchTargetCompletion(target_to_fetch);
    has1Fetched = false; // reset 2fetch flag
}
```
`fetch2Attempts` is incremented even when 2Fetch is disabled or already used.

The counter `fetch2Attempts` is incremented unconditionally at line 445, before checking `enable2Fetch` and `!has1Fetched`. This means:
- Attempts are counted even when `enable2Fetch` is false
- The second FTQ completion in a 2Fetch cycle (when `has1Fetched` is true) also counts as an attempt

Consider moving the increment inside the condition, or after the early-exit checks in `canExtendToNextFTQ`.
🐛 Suggested fix

```diff
 if (run_out_of_this_entry) {
     // Check if 2fetch is enabled, not fetched first FTQ yet, and if we can extend to the next FTQ
     // NEW: 2Fetch extension check - before processing completion
-    dbpBtbStats.fetch2Attempts++;
-    if (enable2Fetch && !has1Fetched && canExtendToNextFTQ(pc, target_to_fetch)) {
+    if (enable2Fetch && !has1Fetched) {
+        dbpBtbStats.fetch2Attempts++;
+        if (canExtendToNextFTQ(pc, target_to_fetch)) {
             DPRINTF(DecoupleBP, "2Fetch: extending to next FTQ in same cycle\n");
             has1Fetched = true;
             processFetchTargetCompletion(target_to_fetch);
             extendToNextFTQ(pc);
             // first fetchBlock is always taken, do not run out of FTQ now
             return std::make_pair(true, false);
+        }
     }
     processFetchTargetCompletion(target_to_fetch);
```
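The counting behavior being proposed can also be shown as a self-contained sketch, with stand-in names in place of the gem5 members:

```cpp
// Standalone illustration of the corrected counter placement: the
// attempt counter only moves when the feature is enabled and the first
// fetch of the cycle has not yet happened. All names are stand-ins.
struct Fetch2Stats { unsigned attempts = 0; unsigned successes = 0; };

bool tryExtend(bool enable2Fetch, bool has1Fetched, bool canExtend,
               Fetch2Stats &stats)
{
    if (enable2Fetch && !has1Fetched) {
        stats.attempts++;          // counted only for genuine attempts
        if (canExtend) {
            stats.successes++;     // 2Fetch extension taken
            return true;
        }
    }
    return false;                  // disabled, already used, or rejected
}
```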
🚀 Coremark Smoke Test Results

✅ Difftest smoke test passed!
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f95836f92e
```cpp
Addr span = next_ftq.endPC - current_ftq.startPC;
if (span > maxFetchBytesPerCycle) {
    DPRINTF(DecoupleBP, "2Fetch rejected: span %d exceeds %d bytes\n",
            span, maxFetchBytesPerCycle);
    dbpBtbStats.fetch2SpanTooLarge++;
```
Gate the 2fetch extension by the actual fetch-buffer span.

The 2fetch gate only checks `next_ftq.endPC - current_ftq.startPC <= maxFetchBytesPerCycle`, but it does not guarantee that the next FTQ entry actually lies within the current fetch buffer. If `maxFetchBytesPerCycle` is configured larger than `fetchBufferSize` (or the next entry crosses the buffer), `extendToNextFTQ` returns `(true, false)` and leaves `usedUpFetchTargets` false; `checkMemoryNeeds()` will then stall because the PC is outside the buffer, yet `needNewFTQEntry()` will not issue a new cache request, so fetch can hang. Consider clamping the span check to the fetch-buffer size, or invalidating the buffer when the extension spans beyond it.
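The suggested clamping can be sketched as follows; the helper and its parameters are illustrative, with `fetchBufferSize` taken from this comment and the rest assumed:

```cpp
#include <algorithm>
#include <cstdint>

using Addr = uint64_t;

// Sketch of the clamp: gate the 2fetch span on the smaller of the
// configured per-cycle limit and the actual fetch-buffer size, so the
// extension can never span past the buffered window.
bool spanFits(Addr curStart, Addr nextEnd,
              unsigned maxFetchBytesPerCycle, unsigned fetchBufferSize)
{
    unsigned limit = std::min(maxFetchBytesPerCycle, fetchBufferSize);
    return nextEnd - curStart <= limit;
}
```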
- Added handling in the fetch stage to invalidate the fetch buffer and force a new FTQ entry when the architectural fetch PC exceeds the buffered FTQ window in decoupled-frontend mode. This prevents potential deadlocks during ICache stalls.

Change-Id: I38a6003e136a51c115f33358a62138dc987336da
Change-Id: Iaa4a853508ae80e412a8b3bdd41dea26d9eb8c29
🚀 Coremark Smoke Test Results

✅ Difftest smoke test passed!
[Generated by GEM5 Performance Robot] Align BTB Performance: Overall Score
Change-Id: I3b112cc844c485d81cea1f4ed0ff221cb37d2782
Summary by CodeRabbit

New Features
- 2Fetch support: the decoupled branch predictor can fetch two fetch blocks from the same 64-byte fetch buffer in a single cycle, with new configuration parameters and statistics.

Refactor
- Centralized the per-cycle fetch-stopping decision into `shouldStopFetchThisCycle()` in the fetch stage.